Studio report: Linux audio for multi-speaker natural speech technology
Authors
Abstract
The Natural Speech Technology (NST) project is the UK’s flagship research programme for speech recognition in natural environments. NST is a collaboration between Edinburgh, Cambridge and Sheffield Universities; public sector institutions the BBC, NHS and GCHQ; and companies including Nuance, EADS, Cisco and Toshiba. In contrast to the assumptions made by most current commercial speech recognisers, natural environments include situations such as multi-participant meetings, where participants may talk over one another, move around the meeting room and make non-speech vocalisations, all in the presence of noise from office equipment and from external sources such as traffic and people outside the room. To generate data for such cases, we have set up a meeting room / recording studio equipped to record 16 channels of audio from real-life meetings, as well as a large computing cluster for audio analysis. These systems run on free, Linux-based software, and this paper gives details of their implementation as a case study for other users considering Linux audio for similar large projects.
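As a rough illustration of the kind of multichannel capture such a studio requires, the sketch below records 16 channels of audio to a single WAV file on a Linux machine. It is a minimal example only: the abstract does not specify the capture toolchain, and the use of the Python sounddevice and soundfile libraries, the 48 kHz sample rate, the 24-bit format and the output filename are all assumptions rather than details from the paper.

import queue

import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 48000   # Hz; assumed studio rate, not taken from the paper
CHANNELS = 16         # one channel per microphone in the meeting room

blocks = queue.Queue()

def callback(indata, frames, time, status):
    # Runs in the audio thread: copy each block out as quickly as possible.
    if status:
        print(status)
    blocks.put(indata.copy())

# Stream blocks from the sound card straight into a 16-channel WAV file,
# so arbitrarily long meetings never have to fit in memory.
with sf.SoundFile("meeting_16ch.wav", mode="w", samplerate=SAMPLE_RATE,
                  channels=CHANNELS, subtype="PCM_24") as wav:
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=CHANNELS,
                        callback=callback):
        print("Recording... press Ctrl+C to stop.")
        try:
            while True:
                wav.write(blocks.get())
        except KeyboardInterrupt:
            print("Recording stopped.")

In a real studio a JACK-based recorder such as Ardour could play the same role; the sketch only shows that a commodity Linux audio stack can stream 16 simultaneous channels to disk.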
Similar resources
LPFAV2: a New Multi-Modal Database for Developing Speech Recognition Systems for an Assistive Technology Application
In this paper we report on the acquisition and content of a new database intended for developing audio-visual speech recognition systems. This database supports a speaker dependent continuous speech recognition task, based on a small vocabulary, and was captured in the European Portuguese language. Along with the collected multi-modal speech materials, the respective orthographic transcription ...
Detecting audio-visual synchrony using deep neural networks
In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...
The Speakers in the Wild Speaker Recognition Challenge Plan
The Speakers in the Wild (SITW) speaker recognition challenge (SRC) is intended to support research toward the real-world application of automatic speaker recognition technology across speech acquired in unconstrained conditions. The SITW SRC will serve to benchmark current technologies in both single and multi-speaker audio with the dataset and annotations being made publicly available (under ...
Multi-speaker meeting audio segmentation
This paper presents segmentation of multi-speaker meeting audio into four different classes: local speech, crosstalk, overlapped speech and non-speech sounds. Firstly, the Bayesian Information Criterion (BIC) segmentation method is used to pre-segment the meeting according to speaker change points. Then, harmonicity information is integrated into the acoustic features to differentiate speech from non...
Journal:
Volume, Issue
Pages: -
Year of publication: 2012